Causal Inference by Stochastic Complexity
The algorithmic Markov condition states that the most likely causal direction
between two random variables X and Y can be identified as that direction with
the lowest Kolmogorov complexity. Due to the halting problem, however, this
notion is not computable.
We hence propose to do causal inference by stochastic complexity. That is, we
propose to approximate Kolmogorov complexity via the Minimum Description Length
(MDL) principle, using a score that is mini-max optimal with regard to the
model class under consideration. This means that even in an adversarial
setting, such as when the true distribution is not in this class, we still
obtain the optimal encoding for the data relative to the class.
We instantiate this framework, which we call CISC, for pairs of univariate
discrete variables, using the class of multinomial distributions. Experiments
show that CISC is highly accurate on synthetic, benchmark, as well as
real-world data, outperforming the state of the art by a margin, and scales
extremely well with regard to sample and domain sizes.
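The scoring idea described above can be sketched for a pair of discrete variables. This is a minimal illustration, not the authors' implementation. It assumes the stochastic complexity of a sample is its maximum-likelihood code length plus the log of the multinomial NML normalizing sum, and that the conditional cost of Y given X is the sum of the stochastic complexities of Y within each group of equal X. The function names (`multinomial_regret`, `stochastic_complexity`, `conditional_complexity`, `cisc_direction`) are hypothetical, and domain sizes are estimated from the observed values.

```python
import math
from collections import Counter, defaultdict


def _xlog(h, n):
    # h * ln(h / n), with the convention 0 * ln(0) = 0
    return h * math.log(h / n) if h > 0 else 0.0


def multinomial_regret(n, k):
    """Normalizing sum C(n, k) of the multinomial NML distribution."""
    if n == 0 or k == 1:
        return 1.0
    # C(n, 2): direct sum over the count h of the first symbol, in log space
    c_cur = sum(
        math.exp(
            math.lgamma(n + 1) - math.lgamma(h + 1) - math.lgamma(n - h + 1)
            + _xlog(h, n) + _xlog(n - h, n)
        )
        for h in range(n + 1)
    )
    c_prev = 1.0  # C(n, 1)
    # Recurrence C(n, j + 2) = C(n, j + 1) + (n / j) * C(n, j)
    for j in range(1, k - 1):
        c_prev, c_cur = c_cur, c_cur + (n / j) * c_prev
    return c_cur


def stochastic_complexity(values, k):
    """Code length in bits: maximum-likelihood cost plus log of the regret."""
    n = len(values)
    if n == 0:
        return 0.0
    ml_bits = -sum(c * math.log2(c / n) for c in Counter(values).values())
    return ml_bits + math.log2(multinomial_regret(n, k))


def conditional_complexity(target, given, k_target):
    """Assumed conditional cost: encode target separately per value of given."""
    groups = defaultdict(list)
    for t, g in zip(target, given):
        groups[g].append(t)
    return sum(stochastic_complexity(v, k_target) for v in groups.values())


def cisc_direction(x, y):
    """Return the direction with the shorter total code length."""
    kx, ky = len(set(x)), len(set(y))  # assumption: domain = observed values
    x_to_y = stochastic_complexity(x, kx) + conditional_complexity(y, x, ky)
    y_to_x = stochastic_complexity(y, ky) + conditional_complexity(x, y, kx)
    if x_to_y < y_to_x:
        return "x->y"
    if y_to_x < x_to_y:
        return "y->x"
    return "undecided"
```

Calling `cisc_direction(x, y)` on two equal-length lists of discrete values returns `"x->y"`, `"y->x"`, or `"undecided"`. The log-space sum keeps the two-symbol term stable for larger samples; the recurrence then reaches larger domains in time linear in the domain size.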
Evaluating the Fairness of Discriminative Foundation Models in Computer Vision
We propose a novel taxonomy for bias evaluation of discriminative foundation
models, such as Contrastive Language-Image Pretraining (CLIP), that are used for
labeling tasks. We then systematically evaluate existing methods for mitigating
bias in these models with respect to our taxonomy. Specifically, we evaluate
OpenAI's CLIP and OpenCLIP models for key applications, such as zero-shot
classification, image retrieval and image captioning. We categorize desired
behaviors along three axes: (i) whether the task concerns humans; (ii) how
subjective the task is (i.e., how likely it is that people from a diverse range
of backgrounds would agree on a labeling); and (iii) the intended purpose of
the task and whether fairness is better served by impartiality (i.e., making
decisions independent of the protected attributes) or representation (i.e.,
making decisions to maximize diversity). Finally, we provide quantitative
fairness evaluations for both binary-valued and multi-valued protected
attributes over ten diverse datasets. We find that fair PCA, a post-processing
method for fair representations, works very well for debiasing in most of the
aforementioned tasks while incurring only minor loss of performance. However,
different debiasing approaches vary in their effectiveness depending on the
task. Hence, one should choose the debiasing approach depending on the specific
use case.
Comment: Accepted at AIES'2
Causal Inference on Event Sequences
Given two discrete-valued time series (that is, event sequences) of length n, can we tell whether they are causally related? That is, can we tell whether x^n causes y^n, or whether y^n causes x^n? Can we do so without having to make assumptions on the distribution of these time series, or about the lag of the causal effect? And, importantly for practical application, can we do so accurately and efficiently? These are exactly the questions we answer in this paper.
We propose a causal inference framework for event sequences based on information theory. We build upon the well-known notion of Granger causality, and define causality in terms of compression. We infer that x^n is likely a cause of y^n if y^n can be (much) better sequentially compressed given the past of both y^n and x^n than the other way around. To compress the data we use the notion of sequential normalized maximum likelihood, which means we use minimax optimal codes with respect to a parametric family of distributions. To show this works in practice, we propose CUTE, a linear time method for inferring the causal direction between two event sequences. Empirical evaluation shows that CUTE works well in practice, is much more robust than transfer entropy, and ably reconstructs the ground truth on river flow and spike train data.
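The compression-based reading of Granger causality can be illustrated with a small sketch. It is not CUTE itself: instead of sequential normalized maximum likelihood it swaps in a Krichevsky-Trofimov plug-in estimator, and it fixes the context to the previous symbol (lag 1), which is an assumption the paper explicitly avoids. The function names (`sequential_code_length`, `causal_gain`, `infer_direction`) are hypothetical.

```python
import math
from collections import defaultdict


def sequential_code_length(target, contexts, alphabet_size=2):
    """Bits needed to encode target symbol by symbol, given one context each.

    Uses the Krichevsky-Trofimov estimator (c + 1/2) / (n + m/2) per context,
    updating the counts only after each symbol has been encoded.
    """
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    bits = 0.0
    for sym, ctx in zip(target, contexts):
        p = (counts[ctx][sym] + 0.5) / (totals[ctx] + 0.5 * alphabet_size)
        bits -= math.log2(p)
        counts[ctx][sym] += 1
        totals[ctx] += 1
    return bits


def causal_gain(cause, effect):
    """Bits saved on effect when the past of cause is added to its own past."""
    steps = range(1, len(effect))
    own_past = [(effect[t - 1],) for t in steps]
    both_pasts = [(effect[t - 1], cause[t - 1]) for t in steps]
    target = effect[1:]
    return (sequential_code_length(target, own_past)
            - sequential_code_length(target, both_pasts))


def infer_direction(x, y):
    """Pick the direction in which the other series' past helps more."""
    gain_xy, gain_yx = causal_gain(x, y), causal_gain(y, x)
    if gain_xy > gain_yx:
        return "x->y"
    if gain_yx > gain_xy:
        return "y->x"
    return "undecided"
```

Each symbol is encoded once with constant work per step, so the sketch runs in time linear in the sequence length, in line with the linear-time claim made for CUTE.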
Origo: Causal Inference by Compression
Causal inference from observational data is one of the most fundamental problems in science. In general, the task is to tell whether it is more likely that X caused Y, or vice versa, given only data over their joint distribution. In this paper we propose a general inference framework based on Kolmogorov complexity, as well as a practical and computable instantiation based on the Minimum Description Length (MDL) principle.
Simply put, we propose causal inference by compression. That is, we infer that X is a likely cause of Y if we can better compress the data by first encoding X, and then encoding Y given X, than in the other direction. To show this works in practice, we propose Origo, an efficient method for inferring the causal direction from binary data. Origo employs the lossless Pack compressor (Tatti & Vreeken, 2008) and searches for the set of decision trees that encodes the data most succinctly. Importantly, it works directly on the data and does not require assumptions about either the distributions or the type of causal relations.
To evaluate Origo in practice, we provide extensive experiments on synthetic, benchmark, and real-world data, including three case studies. Altogether the experiments show that Origo reliably infers the correct causal direction across a wide range of settings.
Accurate Causal Inference on Discrete Data
Additive Noise Models (ANMs) provide a theoretically sound approach to inferring the most likely causal direction between pairs of random variables given only a sample from their joint distribution. The key assumption is that the effect is a function of the cause, with additive noise that is independent of the cause. In many cases ANMs are identifiable. Their performance, however, hinges on the chosen dependence measure and on the assumptions we make about the true distribution. In this paper we propose to use Shannon entropy to measure the dependence within an ANM, which gives us a general approach in which we neither have to assume a true distribution nor have to perform explicit significance tests during optimization. The information-theoretic formulation gives us a general, efficient, identifiable, and, as the experiments show, highly accurate method for causal inference on pairs of discrete variables, achieving (near) 100% accuracy on both synthetic and real data.
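An entropy-scored additive noise model for integer-valued data might look as follows. This is a simplified sketch under stated assumptions, not the paper's algorithm. It assumes the score for a direction is the entropy of the cause plus the entropy of the residual, and it fits the function by mapping each cause value to its most frequent effect value, without any further optimization. The names `fit_mode_function`, `anm_score`, and `infer_direction` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(values):
    """Empirical Shannon entropy in bits."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def fit_mode_function(cause, effect):
    """Map each cause value to its most frequent effect value (assumed fit)."""
    by_cause = defaultdict(Counter)
    for c, e in zip(cause, effect):
        by_cause[c][e] += 1
    return {c: cnt.most_common(1)[0][0] for c, cnt in by_cause.items()}


def anm_score(cause, effect):
    """Assumed score: H(cause) + H(residual), residual = effect - f(cause)."""
    f = fit_mode_function(cause, effect)
    residuals = [e - f[c] for c, e in zip(cause, effect)]
    return entropy(cause) + entropy(residuals)


def infer_direction(x, y):
    """Prefer the direction with the lower total entropy score."""
    score_xy, score_yx = anm_score(x, y), anm_score(y, x)
    if score_xy < score_yx:
        return "x->y"
    if score_yx < score_xy:
        return "y->x"
    return "undecided"


# Toy data: y = x^2 + noise, with noise drawn from {0, 0, 1} for every x
x = [xi for xi in (-2, -1, 0, 1, 2) for _ in (0, 0, 1)]
y = [xi * xi + ni for xi in (-2, -1, 0, 1, 2) for ni in (0, 0, 1)]
print(infer_direction(x, y))  # prints "x->y"
```

On the toy data the forward residual is exactly the noise, while the backward fit cannot undo the non-injective square, so the backward residual has higher entropy and the forward direction wins.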